DISCUSSION OF: BROWNIAN DISTANCE COVARIANCE



By Michael R. Kosorok 

University of North Carolina at Chapel Hill 

We discuss briefly the very interesting concept of Brownian distance covariance developed by Székely and Rizzo [Ann. Appl. Statist. (2009), to appear] and describe two possible extensions. The first extension is for high-dimensional data that can be coerced into a Hilbert space, including certain high-throughput screening and functional data settings. The second extension involves very simple modifications that may yield increased power in some settings. We commend Székely and Rizzo for their very interesting work and recognize that this general idea has the potential to have a large impact on the way in which statisticians evaluate dependency in data.

1. Introduction and assessment. The Brownian distance covariance and correlation proposed by Székely and Rizzo (2009) (abbreviated SR hereafter) is a very useful and elegant alternative to the standard measures of correlation and is based on several deep and nontrivial theoretical calculations developed earlier in Székely, Rizzo and Bakirov (2007) (abbreviated SRB hereafter). We congratulate the authors on this very original and elegant work. The main result is that a single, simple statistic $\mathcal{V}_n(X,Y)$ can be used to assess, based on an i.i.d. sample, whether two random vectors $X$ and $Y$, of possibly different respective dimensions $p$ and $q$, are dependent.

The proposed statistic $\mathcal{V}_n(X,Y)$ estimates an interesting population parameter $\mathcal{V}(X,Y)$ that the authors demonstrate can also be expressed as the covariance between independent Brownian motions $W$ and $W'$, with $p$- and $q$-dimensional indices, evaluated at $X$ and $Y$, respectively. Specifically, let $W: \mathbb{R}^p \mapsto \mathbb{R}$ be a real-valued, tight, mean-zero Gaussian process with covariance $|s|_p + |t|_p - |s-t|_p$, for $s,t \in \mathbb{R}^p$, where $|\cdot|_p$ is the standard Euclidean norm in $\mathbb{R}^p$. Let $W'$ be similarly defined but for indices $s,t \in \mathbb{R}^q$ and norm $|\cdot|_q$. It can be shown that $\mathcal{V}(X,Y) = E[W(X)W(X')W'(Y)W'(Y')]$, where $(X',Y')$ is an independent copy of $(X,Y)$, where $W$ and $W'$ are independent of both $(X,Y)$ and $(X',Y')$, and where, as in SR, $W(X)$ and $W'(Y)$ are understood as centered, that is, replaced by $W(X) - E[W(X) \mid W]$ and $W'(Y) - E[W'(Y) \mid W']$. This justifies the designation "Brownian distance covariance."

By replacing Brownian motion with other stochastic processes, a very wide array of alternative forms of correlation between vectors $X$ and $Y$ can be generated. In the special case where $p = q = 1$ and the stochastic processes $W$ and $W'$ are the nonrandom identity functions centered respectively at $E(X)$ and $E(Y)$, that is, $W(x) = x - E(X)$ and $W'(y) = y - E(Y)$,

$$\mathcal{V}(X,Y) = E[W(X)W(X')W'(Y)W'(Y')] = \operatorname{Cov}^2(X,Y),$$

which is the standard Pearson product-moment covariance squared. Thus, the results obtained by SR not only have a profound connection to Brownian motion but also include traditional measures of dependence as special cases, while at the same time having the potential to generate many useful new measures of dependence through the use of stochastic processes other than Brownian motion. This raises the very real possibility that a broadly applicable and unified theoretical and methodological framework for testing dependence could be developed.
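As a quick sanity check of the special case just described, the following Python sketch evaluates $E[W(X)W(X')W'(Y)W'(Y')]$ exactly for a small discrete joint distribution, with $W$ and $W'$ the centered identity maps, by summing over all pairs of support points of $(X,Y)$ and its independent copy $(X',Y')$. The probability table `pmf` and the function names are illustrative assumptions of ours; the only point is that the result coincides with the squared Pearson covariance.

```python
from itertools import product


def means(pmf):
    """Return (E[X], E[Y]) for a discrete joint law given as {(x, y): prob}."""
    mean_x = sum(p * x for (x, _), p in pmf.items())
    mean_y = sum(p * y for (_, y), p in pmf.items())
    return mean_x, mean_y


def process_covariance(pmf):
    """E[W(X) W(X') W'(Y) W'(Y')] with W(x) = x - E[X] and W'(y) = y - E[Y].

    (X', Y') is an independent copy of (X, Y), so we sum over all pairs of
    support points, weighted by the product of their probabilities.
    """
    mean_x, mean_y = means(pmf)
    total = 0.0
    for ((x1, y1), p1), ((x2, y2), p2) in product(pmf.items(), repeat=2):
        total += p1 * p2 * (x1 - mean_x) * (x2 - mean_x) * (y1 - mean_y) * (y2 - mean_y)
    return total


def pearson_covariance(pmf):
    """Cov(X, Y) = E[(X - E[X])(Y - E[Y])]."""
    mean_x, mean_y = means(pmf)
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())


# Illustrative joint distribution of (X, Y) on four support points.
pmf = {(0, 0): 0.4, (1, 1): 0.3, (1, 0): 0.1, (2, 1): 0.2}
print(process_covariance(pmf), pearson_covariance(pmf) ** 2)
```

Because $(X',Y')$ is independent of $(X,Y)$, the double sum factorizes into the square of the single sum, which is exactly the identity stated above.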

The SR paper is therefore important not only for the specific results contained therein but also for the possibly far-reaching consequences for future statistical research in both theory and applications. For the remainder of the paper, we describe two possible extensions of these results. The first extension is for high-dimensional data that can be coerced into a Hilbert space, including certain high-throughput screening and functional data settings. The second extension involves very simple modifications that may yield increased power in some settings. We first present some initial results and consequences of SR and SRB that will prove useful in later developments. We then present the Hilbert space extension with a few example applications. Some modifications leading to potential variations in power will then be described. The paper will then conclude with a brief discussion.

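2. Some initial results. We now present a few initial results which will be useful in later sections. For a paired sample of size $n$, $(X_1,Y_1),\dots,(X_n,Y_n)$, of realizations of $(X,Y)$, where $X$ and $Y$ are random variables from arbitrary normed spaces with respective norms $\|\cdot\|_X$ and $\|\cdot\|_Y$, define, analogously to SR,

$$T_1 = \frac{1}{n^2}\sum_{k,l=1}^n \|X_k - X_l\|_X\,\|Y_k - Y_l\|_Y, \qquad T_2 = \frac{1}{n^2}\sum_{k,l=1}^n \|X_k - X_l\|_X \cdot \frac{1}{n^2}\sum_{k,l=1}^n \|Y_k - Y_l\|_Y,$$

$$T_3 = \frac{1}{n^3}\sum_{k=1}^n \sum_{l,j=1}^n \|X_k - X_l\|_X\,\|Y_k - Y_j\|_Y,$$

and $\mathcal{V}_n(X,Y) = T_1 + T_2 - 2T_3$. Also define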

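$$T_{10} = E[\|X_1 - X_2\|_X\,\|Y_1 - Y_2\|_Y],$$

$$T_{20} = E[\|X_1 - X_2\|_X] \times E[\|Y_1 - Y_2\|_Y],$$

$$T_{30} = E[\|X_1 - X_2\|_X\,\|Y_1 - Y_3\|_Y],$$

where $(X_1,Y_1)$, $(X_2,Y_2)$ and $(X_3,Y_3)$ are i.i.d. copies of $(X,Y)$, and let $\mathcal{V}_0(X,Y) = T_{10} + T_{20} - 2T_{30}$. Also let $\mathcal{V}_n(X) = \mathcal{V}_n(X,X)$ and $\mathcal{V}_0(X) = \mathcal{V}_0(X,X)$; and let $\mathcal{V}_n(Y) = \mathcal{V}_n(Y,Y)$ and $\mathcal{V}_0(Y) = \mathcal{V}_0(Y,Y)$. This allows us to define also $\mathcal{R}_n(X,Y) = \mathcal{V}_n(X,Y)/\sqrt{\mathcal{V}_n(X)\mathcal{V}_n(Y)}$ and $\mathcal{R}_0(X,Y) = \mathcal{V}_0(X,Y)/\sqrt{\mathcal{V}_0(X)\mathcal{V}_0(Y)}$, provided the denominators are nonzero (and defined to be zero otherwise). The main distinction between this and the definitions in SR is the use of arbitrary normed spaces.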
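The sample quantities above depend on the data only through pairwise distances, which is what makes arbitrary normed spaces admissible. The following Python sketch (the function names `dcov_v`, `dcov_double_centered` and `dcor_r` are our own, not taken from SR) computes $\mathcal{V}_n = T_1 + T_2 - 2T_3$ and $\mathcal{R}_n$ for any user-supplied distance function, defaulting to the Euclidean `math.dist`. As a cross-check it also evaluates the equivalent double-centered form $n^{-2}\sum_{k,l} A_{kl}B_{kl}$, which, as we recall, is how SR present the empirical statistic.

```python
import math


def distance_matrix(points, dist):
    """Pairwise distances dist(points[k], points[l]) for a user-supplied distance."""
    return [[dist(u, v) for v in points] for u in points]


def dcov_v(xs, ys, dist_x=math.dist, dist_y=math.dist):
    """V_n(X, Y) = T1 + T2 - 2*T3 for a paired sample (xs[k], ys[k])."""
    n = len(xs)
    a = distance_matrix(xs, dist_x)
    b = distance_matrix(ys, dist_y)
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    # T3 = n^-3 sum_k sum_{l,j} a[k][l] * b[k][j] factorizes into row sums.
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2 * t3


def dcov_double_centered(xs, ys, dist_x=math.dist, dist_y=math.dist):
    """Equivalent form n^-2 sum_{k,l} A_kl B_kl using double-centered distances."""
    def center(m):
        n = len(m)
        row = [sum(r) / n for r in m]
        grand = sum(row) / n
        # Distance matrices are symmetric, so column means equal row means.
        return [[m[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]

    big_a = center(distance_matrix(xs, dist_x))
    big_b = center(distance_matrix(ys, dist_y))
    n = len(xs)
    return sum(big_a[k][l] * big_b[k][l] for k in range(n) for l in range(n)) / n**2


def dcor_r(xs, ys, dist_x=math.dist, dist_y=math.dist):
    """R_n(X, Y) = V_n(X, Y) / sqrt(V_n(X) V_n(Y)), set to 0 if the denominator is 0."""
    v_xy = dcov_v(xs, ys, dist_x, dist_y)
    v_x = dcov_v(xs, xs, dist_x, dist_x)
    v_y = dcov_v(ys, ys, dist_y, dist_y)
    # Guard against tiny negative values caused by floating-point rounding.
    denom = math.sqrt(max(0.0, v_x * v_y))
    return v_xy / denom if denom > 0 else 0.0


xs = [(0.0, 1.0), (2.0, 0.5), (1.0, 1.0), (3.0, -1.0)]
ys = [(1.0,), (0.0,), (2.0,), (0.5,)]
print(dcov_v(xs, ys), dcov_double_centered(xs, ys), dcor_r(xs, ys))
```

Passing a different `dist_x` or `dist_y` is all that is needed to move to another normed space; for instance, a weighted quadratic-form distance gives the "tilde" statistics introduced below.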

Because this has a standard U-statistic structure, we have the following general result, the proof of which follows from standard theory for U-statistics [see, e.g., Chapter 12 of van der Vaart (1998)]:

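Lemma 1. Provided $E\|X\|_X < \infty$ and $E\|Y\|_Y < \infty$, then $\mathcal{V}_n(X,Y) \xrightarrow{P} \mathcal{V}_0(X,Y)$, $\mathcal{V}_n(X) \xrightarrow{P} \mathcal{V}_0(X)$ and $\mathcal{V}_n(Y) \xrightarrow{P} \mathcal{V}_0(Y)$.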

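Remark 1. In the special case where $X$ and $Y$ are from finite-dimensional Euclidean spaces, we know from Theorems 1-4 of SR that $\mathcal{V}_n(X,Y)$, $\mathcal{V}_n(X)$, $\mathcal{V}_n(Y)$, $\mathcal{V}_0(X,Y)$, $\mathcal{V}_0(X)$ and $\mathcal{V}_0(Y)$ are all nonnegative; that $\mathcal{V}_n(X,Y) \le \sqrt{\mathcal{V}_n(X)\mathcal{V}_n(Y)}$ and $\mathcal{V}_0(X,Y) \le \sqrt{\mathcal{V}_0(X)\mathcal{V}_0(Y)}$; that $\mathcal{V}_0(X) = 0$ or $\mathcal{V}_0(Y) = 0$ only when $X$ or $Y$, respectively, is trivial (almost surely constant); that $\mathcal{V}_n(X) = 0$ or $\mathcal{V}_n(Y) = 0$ only when the $X$'s or $Y$'s in the sample are all identical; that $0 \le \mathcal{R}_n(X,Y), \mathcal{R}_0(X,Y) \le 1$; and that $\mathcal{V}_0(X,Y) = 0$ only when $X$ and $Y$ are independent.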

We now wish to generalize the above results in the finite-dimensional context to a class of norms broader than the Euclidean norms. These results will be useful for later sections. Let $A$ and $B$ be, respectively, $p \times p$ and $q \times q$ symmetric, positive definite matrices. Let a "tilde" placed over $T_1$, $T_2$, $T_3$, $\mathcal{V}_n$, $\mathcal{V}_0$, etc., denote the quantity obtained by replacing $|x|_p$ with $\|x\|_{A,p} = \sqrt{x'Ax}$ and $|y|_q$ with $\|y\|_{B,q} = \sqrt{y'By}$ in $\mathcal{V}_n$, $\mathcal{V}_0$, etc. For example,

$$\tilde{T}_1 = \frac{1}{n^2}\sum_{k,l=1}^n \|X_k - X_l\|_{A,p}\,\|Y_k - Y_l\|_{B,q}.$$

We now have the following very simple extension:

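Lemma 2. Let $A$ and $B$ be symmetric and positive definite. Then $\tilde{\mathcal{V}}_n(X,Y)$, $\tilde{\mathcal{V}}_n(X)$, $\tilde{\mathcal{V}}_n(Y)$, $\tilde{\mathcal{V}}_0(X,Y)$, $\tilde{\mathcal{V}}_0(X)$ and $\tilde{\mathcal{V}}_0(Y)$ are all nonnegative; and all of the other results in Remark 1 remain true with a "tilde" placed over the given quantities. Moreover, $\tilde{\mathcal{V}}_0(X,Y) = 0$ if and only if $\mathcal{V}_0(X,Y) = 0$.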
Proof. For a symmetric, positive definite matrix $C$, let $C^{1/2}$ denote the symmetric square root of $C$, that is, $C^{1/2}C^{1/2} = C$. Note that such a square root always exists and, moreover, is always positive definite. Now define $U = A^{1/2}X$ and $V = B^{1/2}Y$, and note that $|U|_p = \|X\|_{A,p}$ and $|V|_q = \|Y\|_{B,q}$; by linearity the same holds for differences, so that, for example, $|U_k - U_l|_p = \|X_k - X_l\|_{A,p}$. Hence each "tilde" quantity computed from $(X,Y)$ equals the corresponding Euclidean quantity computed from $(U,V)$. Now replace $X$ and $Y$ in the quantities listed in Remark 1 with $U$ and $V$; this proves the lemma up to, but not including, its last sentence. The last sentence follows from the simple observation that $U$ and $V$ are independent if and only if $X$ and $Y$ are independent, since $A^{1/2}$ and $B^{1/2}$ are positive definite and hence invertible. Since $\tilde{\mathcal{V}}_0(X,Y) = \mathcal{V}_0(U,V)$, which is zero if and only if $U$ and $V$ are independent, we conclude that $\tilde{\mathcal{V}}_0(X,Y) = 0$ if and only if $X$ and $Y$ are independent, which, by Remark 1, holds if and only if $\mathcal{V}_0(X,Y) = 0$. The entire lemma now follows. □
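The key step of the proof, that a "tilde" statistic is just the Euclidean statistic applied to $U = A^{1/2}X$, can be checked numerically. The following Python sketch uses an illustrative $2 \times 2$ matrix $A$ of our own choosing (with $B$ the identity) and computes $A^{1/2}$ from the closed form available for $2 \times 2$ symmetric positive definite matrices; the helper names are hypothetical and serve only this illustration.

```python
import math


def sqrt_2x2_spd(m):
    """Symmetric square root of a 2x2 symmetric positive definite matrix.

    Closed form: sqrt(M) = (M + s I) / t with s = sqrt(det M) and
    t = sqrt(trace M + 2 s); this follows from the Cayley-Hamilton theorem.
    """
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def matvec(m, v):
    """Matrix-vector product returning a tuple."""
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in m)


def a_norm_dist(a):
    """Distance induced by the norm ||x||_{A,p} = sqrt(x' A x)."""
    def dist(u, v):
        d = [ui - vi for ui, vi in zip(u, v)]
        return math.sqrt(sum(d[i] * a[i][j] * d[j] for i in range(len(d)) for j in range(len(d))))
    return dist


def dcov_v(xs, ys, dist_x, dist_y):
    """V_n(X, Y) = T1 + T2 - 2*T3 computed from pairwise distances."""
    n = len(xs)
    a = [[dist_x(u, v) for v in xs] for u in xs]
    b = [[dist_y(u, v) for v in ys] for u in ys]
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2 * t3


A = [[2.0, 0.5], [0.5, 1.0]]  # illustrative symmetric positive definite weight matrix
xs = [(0.0, 1.0), (2.0, 0.5), (1.0, 1.0), (3.0, -1.0), (-1.0, 0.0)]
ys = [(1.0,), (0.0,), (2.0,), (0.5,), (1.5,)]

# "Tilde" statistic: A-norm on X, Euclidean norm on Y (that is, B = I).
tilde_v = dcov_v(xs, ys, a_norm_dist(A), math.dist)
# The same statistic via U = A^{1/2} X with ordinary Euclidean distances.
root = sqrt_2x2_spd(A)
us = [matvec(root, x) for x in xs]
plain_v = dcov_v(us, ys, math.dist, math.dist)
print(tilde_v, plain_v)
```

The two printed values agree up to rounding, reflecting $\|X_k - X_l\|_{A,p} = |A^{1/2}(X_k - X_l)|_p$.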

The third initial result involves some nontrivial properties of independent components in the finite-dimensional setting. Suppose, for $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, where $p = p_1 + p_2$ and $q = q_1 + q_2$, we have

$$X = \begin{pmatrix} X^{(1)} + X^{(2)} \\ X^{(3)} \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} Y^{(1)} + Y^{(2)} \\ Y^{(3)} \end{pmatrix},$$

where $X^{(1)}, X^{(2)} \in \mathbb{R}^{p_1}$, $X^{(3)} \in \mathbb{R}^{p_2}$, $Y^{(1)}, Y^{(2)} \in \mathbb{R}^{q_1}$ and $Y^{(3)} \in \mathbb{R}^{q_2}$; and suppose also that the two vectors $\bar{X} = ([X^{(2)}]^T, [X^{(3)}]^T)^T$ and $\bar{Y} = ([Y^{(2)}]^T, [Y^{(3)}]^T)^T$ are mutually independent and also independent of $(X^{(1)}, Y^{(1)})$. We have the following somewhat surprising result:

Lemma 3. $\mathcal{V}_0(X,Y) = \mathcal{V}_0(X^{(1)}, Y^{(1)})$.

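Proof. For any $t \in \mathbb{R}^p$ and $s \in \mathbb{R}^q$, with $t = (t_1^T, t_2^T)^T$, $s = (s_1^T, s_2^T)^T$, $t_1 \in \mathbb{R}^{p_1}$, $t_2 \in \mathbb{R}^{p_2}$, $s_1 \in \mathbb{R}^{q_1}$ and $s_2 \in \mathbb{R}^{q_2}$, the independence assumptions and standard characteristic function properties yield, with $f_Z$ denoting the characteristic function of a random vector $Z$,

$$\begin{aligned}
&\bigl|E\exp(i[t^TX + s^TY]) - E\exp(it^TX)\,E\exp(is^TY)\bigr| \\
&\quad= \bigl|f_{\bar{X}}(t)\,f_{\bar{Y}}(s)\bigl\{E\exp(i[t_1^TX^{(1)} + s_1^TY^{(1)}]) - E\exp(it_1^TX^{(1)})\,E\exp(is_1^TY^{(1)})\bigr\}\bigr| \\
&\quad= \bigl|E\exp(i[t_1^TX^{(1)} + s_1^TY^{(1)}]) - E\exp(it_1^TX^{(1)})\,E\exp(is_1^TY^{(1)})\bigr| \\
&\quad= \bigl|f_{X^{(1)},Y^{(1)}}(t_1,s_1) - f_{X^{(1)}}(t_1)\,f_{Y^{(1)}}(s_1)\bigr|.
\end{aligned}$$

Combining this with Theorems 1 and 2 of SR, we obtain that

$$\mathcal{V}_0(X,Y) = \frac{1}{c_pc_q}\int_{\mathbb{R}^{p+q}} \frac{\bigl|f_{X^{(1)},Y^{(1)}}(t_1,s_1) - f_{X^{(1)}}(t_1)\,f_{Y^{(1)}}(s_1)\bigr|^2}{|t|_p^{1+p}\,|s|_q^{1+q}}\,dt\,ds,$$

where $c_d = \pi^{(1+d)/2}/\Gamma((1+d)/2)$ is the normalizing constant of SR. Note that the right-hand side is invariant with respect to the distributions of $\bar{X}$ and $\bar{Y}$ and, thus, we can replace $\bar{X}$ and $\bar{Y}$ with degenerate random variables fixed at zero. Doing the same on the left-hand side yields the desired result. □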

3. High-dimensional extensions. The basic idea we propose is to extend the results to Hilbert spaces which can be approximated by sequences of finite-dimensional Euclidean spaces. We will give a few examples shortly. First, we give the conditions for our results. Assume $X$ is a random variable in a Hilbert space $H_X$ with inner product $\langle\cdot,\cdot\rangle_X$ and norm $\|\cdot\|_X$. A superscript $*$ will be used to denote adjoint. Say that $X$ is "finitely approximable" if there exists a sequence $X_m \in H_X$ such that, for each $m \ge 1$, there exists a linear map $M_m: \mathbb{R}^{p_m} \mapsto H_X$ for which $M_m^*M_m$ is symmetric and positive definite on $\mathbb{R}^{p_m}$, $p_m$ is nondecreasing, $X_m = M_m(U_m)$ for some sequence of Euclidean random variables $U_m$, and $E\|X_m - X\|_X^2 \to 0$ as $m \to \infty$. Note that we can assume that $M_m^*M_m$ is the identity without loss of generality. This follows since we can always replace $U_m$ with $\tilde{U}_m = A_mU_m$ and $M_m$ with $\tilde{M}_m = M_mA_m^{-1}$, where $A_m = (M_m^*M_m)^{1/2}$, to yield $X_m = \tilde{M}_m\tilde{U}_m$ with $\tilde{M}_m^*\tilde{M}_m = A_m^{-1}(M_m^*M_m)A_m^{-1}$ being the identity.
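The normalization argument above can be illustrated in a finite-dimensional stand-in, where $H_X$ is replaced by $\mathbb{R}^3$, $M_m$ is an illustrative $3 \times 2$ matrix of our own choosing, and the adjoint is the transpose. The following Python sketch forms $A_m = (M_m^*M_m)^{1/2}$, sets $\tilde{M}_m = M_mA_m^{-1}$, and checks both that $\tilde{M}_m^*\tilde{M}_m$ is the identity and that $\tilde{M}_m(A_mU_m) = M_mU_m$. The helper names and the value of $U_m$ are hypothetical.

```python
import math


def transpose(m):
    """Transpose of a matrix given as a list of rows (the adjoint in R^d)."""
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    """Product of matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def sqrt_2x2_spd(m):
    """Symmetric square root of a 2x2 SPD matrix: (M + s I) / t, s = sqrt(det), t = sqrt(tr + 2 s)."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def inv_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


M = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # illustrative map R^2 -> R^3 (stand-in for H_X)
gram = matmul(transpose(M), M)            # M* M, symmetric positive definite here
A = sqrt_2x2_spd(gram)                    # A_m = (M* M)^{1/2}
M_tilde = matmul(M, inv_2x2(A))           # M_m A_m^{-1}
gram_tilde = matmul(transpose(M_tilde), M_tilde)

u = [[0.3], [-1.2]]                        # one hypothetical realization of U_m (column vector)
x_original = matmul(M, u)                  # X_m = M_m U_m
x_normalized = matmul(M_tilde, matmul(A, u))  # X_m = M~_m U~_m with U~_m = A_m U_m
print(gram_tilde)
```

The same algebra goes through in a general Hilbert space because only $M_m^*M_m$, a $p_m \times p_m$ matrix, has to be manipulated.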

Example 1. Let $X$ be functional data with realizations that are functions in the Hilbert space $H_X = L_2[0,1]$ consisting of functions $f: [0,1] \mapsto \mathbb{R}$ satisfying $\|f\|_X^2 = \int_0^1 f^2(t)\,dt < \infty$. Specifically, we will assume that

$$X(t) = \sum_{i=1}^\infty \lambda_iZ_i\phi_i(t), \qquad t \in [0,1], \tag{1}$$

where $Z_1, Z_2, \dots$ are independent random variables with mean zero and variance 1, $\phi_1, \phi_2, \dots$ form an orthonormal basis in $L_2[0,1]$, and $\lambda_1, \lambda_2, \dots$ are fixed constants satisfying $\sum_{i=1}^\infty \lambda_i^2 < \infty$. This formulation can yield a large variety of tight stochastic processes and can be a realistic model for some kinds of functional data.

Let $p_m = m$, $U_m = (\lambda_1Z_1, \dots, \lambda_mZ_m)^T$, and, for any vector $a \in \mathbb{R}^{p_m}$, $M_m(a) = \sum_{i=1}^m a_i\phi_i(\cdot)$. Clearly, $X_m = M_m(U_m)$ is in $H_X$ almost surely, since $\|X_m\|_X^2 = \sum_{i=1}^m \lambda_i^2Z_i^2$ is finite almost surely. Moreover, for any $f \in L_2[0,1]$, it can be shown that

$$M_m^*(f) = \left(\int_0^1 f(t)\phi_1(t)\,dt, \dots, \int_0^1 f(t)\phi_m(t)\,dt\right)^T$$

and, thus, $M_m^*M_m$ is the identity by the orthonormality of the basis and is therefore positive definite. Since $\sum_{i=1}^\infty \lambda_i^2 < \infty$,

$$E\|X - X_m\|_X^2 = E\left\|\sum_{i=m+1}^\infty \lambda_iZ_i\phi_i\right\|_X^2 = \sum_{i=m+1}^\infty \lambda_i^2 \to 0,$$

as $m \to \infty$. Thus, $X$ is finitely approximable.
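To make Example 1 concrete, the following Python sketch uses one particular orthonormal basis of $L_2[0,1]$, $\phi_i(t) = \sqrt{2}\sin(i\pi t)$, and the illustrative weights $\lambda_i = 1/i$; both choices are ours and are not part of the example itself. It approximates the Gram matrix $M_m^*M_m = [\langle\phi_i,\phi_j\rangle]$ by the midpoint rule and evaluates the truncation error $E\|X - X_m\|_X^2 = \sum_{i>m}\lambda_i^2$.

```python
import math


def phi(i, t):
    """Illustrative orthonormal basis of L2[0, 1]: phi_i(t) = sqrt(2) sin(i pi t), i >= 1."""
    return math.sqrt(2.0) * math.sin(i * math.pi * t)


def gram_matrix(m, grid=2000):
    """Midpoint-rule approximation of M_m* M_m = [<phi_i, phi_j>] for 1 <= i, j <= m."""
    ts = [(k + 0.5) / grid for k in range(grid)]
    return [
        [sum(phi(i, t) * phi(j, t) for t in ts) / grid for j in range(1, m + 1)]
        for i in range(1, m + 1)
    ]


def tail_mse(m):
    """E||X - X_m||_X^2 = sum_{i > m} lambda_i^2 for the illustrative choice lambda_i = 1/i."""
    # sum_{i >= 1} 1/i^2 = pi^2 / 6, so the tail is pi^2/6 minus the first m terms.
    return math.pi**2 / 6 - sum(1.0 / i**2 for i in range(1, m + 1))


G = gram_matrix(4)
print([[round(g, 6) for g in row] for row in G])
print([round(tail_mse(m), 4) for m in (1, 10, 100)])
```

The Gram matrix is numerically the identity, and the tail decays roughly like $1/m$ for these weights, so $X_m \to X$ in mean square as the example requires.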

Example 2. This is basically the same as Example 1, except that we will not require the basis functions to be orthogonal. Specifically, let $X(t)$ be as given in (1), with the basis functions satisfying $\int_0^1 \phi_i^2(s)\,ds = 1$, for all $i \ge 1$, but not necessarily being mutually orthogonal. Let $a_{ij} = \int_0^1 \phi_i(s)\phi_j(s)\,ds$, for $i,j \ge 1$, and define $A_m$ to be the $m \times m$ matrix with entry $a_{ij}$ in row $i$ and column $j$, for $1 \le i,j \le m$. Assume that $A_m$ is positive definite for each $m \ge 1$ and also assume that $\lim_{m\to\infty}\sum_{i,j=m+1}^\infty \lambda_i\lambda_ja_{ij} = 0$. If we now follow calculations parallel to those done in Example 1, we can readily deduce that, with $X_m = \sum_{i=1}^m \lambda_iZ_i\phi_i(t)$, we have $M_m$ and $U_m$ defined as before, but with $M_m^*M_m = A_m$ instead of the identity, while $E\|X - X_m\|_X^2 \to 0$ also as before. The increased flexibility enlarges the scope of achievable stochastic processes to include, for example, Brownian motion.
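A simple family satisfying the conditions of Example 2 is given by normalized monomials $\phi_i(t) = \sqrt{2i-1}\,t^{i-1}$ on $[0,1]$, which have unit norm but are not orthogonal; this family is our own illustrative choice. Their Gram matrix has entries $a_{ij} = \sqrt{(2i-1)(2j-1)}/(i+j-1)$, a rescaled Hilbert matrix. The following Python sketch builds $A_m$ and checks positive definiteness with a Cholesky factorization written out by hand.

```python
import math


def gram_unit_monomials(m):
    """A_m for phi_i(t) = sqrt(2i - 1) t^(i - 1) on [0, 1] (unit norm, not orthogonal).

    a_ij = int_0^1 phi_i(t) phi_j(t) dt = sqrt((2i - 1)(2j - 1)) / (i + j - 1).
    """
    return [[math.sqrt((2 * i - 1) * (2 * j - 1)) / (i + j - 1)
             for j in range(1, m + 1)] for i in range(1, m + 1)]


def is_positive_definite(a):
    """Attempt a Cholesky factorization; it succeeds iff the symmetric matrix a is PD."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


A5 = gram_unit_monomials(5)
print(is_positive_definite(A5), [round(A5[0][j], 3) for j in range(5)])
```

The diagonal entries equal 1 by construction while the off-diagonal entries are far from 0, so $M_m^*M_m = A_m$ is positive definite but not the identity, which is exactly the situation Example 2 allows. For larger $m$ this matrix becomes very ill-conditioned, so a floating-point check is only indicative.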

Example 3. Let $X = (X^{(1)}, X^{(2)}, \dots)^T$ be an infinitely long Euclidean vector in $\ell_2$, that is, $\sum_{i=1}^\infty [X^{(i)}]^2 < \infty$ almost surely; and assume that, after permuting the indices if necessary,

$$\sum_{i=m+1}^\infty E\bigl\{[X^{(i)}]^2\bigr\} \to 0,$$

as $m \to \infty$. It is fairly easy to see that if we let $X_m$ be a vector with the first $m$ elements identical to the first $m$ elements of $X$ but with all remaining elements equal to zero, then $E\|X - X_m\|_X^2 \to 0$, as $m \to \infty$, and all of the remaining conditions for finite approximability are satisfied. This example may be applicable to certain high-throughput screening settings where the vector of measurements may be arbitrarily high-dimensional.

The following lemma tells us that the range-related properties of Brownian 
distance covariance are preserved for finitely approximable random variables: 



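Lemma 4. Assume that $X$ and $Y$ are both finitely approximable random variables in Hilbert spaces. Then $\mathcal{V}_n(X,Y)$, $\mathcal{V}_n(X)$, $\mathcal{V}_n(Y)$, $\mathcal{V}_0(X,Y)$, $\mathcal{V}_0(X)$ and $\mathcal{V}_0(Y)$ are all nonnegative, $\mathcal{V}_n(X,Y) \le \sqrt{\mathcal{V}_n(X)\mathcal{V}_n(Y)}$, $\mathcal{V}_0(X,Y) \le \sqrt{\mathcal{V}_0(X)\mathcal{V}_0(Y)}$, and $0 \le \mathcal{R}_n(X,Y), \mathcal{R}_0(X,Y) \le 1$.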
Proof. Let $X_m$ and $Y_m$ be sequences such that $E\|X - X_m\|_X^2 \to 0$ and $E\|Y - Y_m\|_Y^2 \to 0$ as $m \to \infty$. Using simple algebra, we can verify that $\mathcal{V}_0(X_m,Y_m) \to \mathcal{V}_0(X,Y)$, which implies $\mathcal{V}_0(X,Y) \ge 0$, since each $\mathcal{V}_0(X_m,Y_m)$ equals the corresponding Euclidean quantity for the underlying finite-dimensional vectors (taking $M_m^*M_m$ to be the identity) and is therefore nonnegative by Remark 1. Similar arguments verify the desired results for $\mathcal{V}_0(X)$, $\mathcal{V}_0(Y)$ and $\mathcal{R}_0(X,Y)$. Now, for a sample of size $n$, $(X_1,Y_1),\dots,(X_n,Y_n)$, we can create a sequence of samples $(X_{1m},Y_{1m}),\dots,(X_{nm},Y_{nm})$ such that $\sum_{i=1}^n (E\|X_i - X_{im}\|_X^2 + E\|Y_i - Y_{im}\|_Y^2) \to 0$ by finite approximability. Let $\mathcal{V}_n^{(m)}(X,Y)$ be the same as $\mathcal{V}_n(X,Y)$ but with the $m$th approximating sample replacing the sample observations. Since convergence in mean implies convergence in probability, we can apply basic algebra to verify that $\mathcal{V}_n^{(m)}(X,Y) \to \mathcal{V}_n(X,Y)$ in probability as $m \to \infty$. Similar arguments verify the desired results for $\mathcal{V}_n(X)$, $\mathcal{V}_n(Y)$ and $\mathcal{R}_n(X,Y)$, and this completes the proof. □
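The sample part of this argument, in the setting of Example 3, can be illustrated by truncating each observation to its first $m$ coordinates and watching $\mathcal{V}_n^{(m)}(X,Y)$ approach $\mathcal{V}_n(X,Y)$. The following Python sketch simulates a 30-dimensional sample whose $i$th coordinate has scale $2^{-i}$, so that the tail sums in Example 3 vanish; the dimensions, scales, seed and the form of $Y$ are all hypothetical choices made for illustration only.

```python
import math
import random


def dcov_v(xs, ys):
    """V_n(X, Y) = T1 + T2 - 2*T3 with Euclidean distances."""
    n = len(xs)
    a = [[math.dist(u, v) for v in xs] for u in xs]
    b = [[math.dist(u, v) for v in ys] for u in ys]
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2 * t3


def truncate(x, m):
    """X_m: keep the first m coordinates of x and set the remaining ones to zero."""
    return x[:m] + (0.0,) * (len(x) - m)


rng = random.Random(2009)
dim, n = 30, 40
# Hypothetical sample whose i-th coordinate has scale 2**-i, so the tail
# sum of second moments beyond m vanishes as m grows (as in Example 3).
xs = [tuple(rng.gauss(0.0, 1.0) * 2.0 ** (-i) for i in range(dim)) for _ in range(n)]
ys = [(x[0] + 0.5 * x[1] + 0.1 * rng.gauss(0.0, 1.0),) for x in xs]

full = dcov_v(xs, ys)
errors = {m: abs(dcov_v([truncate(x, m) for x in xs], ys) - full)
          for m in (1, 2, 5, 10, 20, dim)}
print(errors)
```

The printed errors shrink as $m$ grows and vanish once no coordinates are dropped.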

Our ultimate goal in this section, however, is to show that $\mathcal{R}_0(X,Y)$ has the same implications for assessing dependence in finitely approximable Hilbert spaces as it does in finite-dimensional settings. This is actually quite challenging, and we are only able to achieve part of the goal in this paper. The following is our first result in this direction:

Lemma 5. Suppose $X$ and $Y$ are random variables in finitely approximable Hilbert spaces. Then $\mathcal{R}_0(X,Y) > 0$ implies that $X$ and $Y$ are dependent.

Proof. Assume that $\mathcal{R}_0(X,Y) > 0$ but that $X$ and $Y$ are independent. By finite approximability, there exists a sequence of paired random variables $(X_m, Y_m)$ such that $X_m$ and $Y_m$ are independent for each $m \ge 1$, $E\|X - X_m\|_X^2 \to 0$ and $E\|Y - Y_m\|_Y^2 \to 0$. This implies that $\mathcal{R}_0(X_m,Y_m) = 0$ for all $m \ge 1$. Since also $\mathcal{R}_0(X_m,Y_m) \to \mathcal{R}_0(X,Y)$, we have a contradiction. Hence, $X$ and $Y$ are dependent. □

If we could also show that $\mathcal{R}_0(X,Y) = 0$ implies independence, we would have essentially full correspondence with the finite-dimensional case. It is unclear how to show this in general, and it may not even be true in general. However, it is certainly true for an interesting special case which we now present.

Let $X$ and $Y$ be random variables in finitely approximable Hilbert spaces. Suppose there exist linear maps $M: H_X \mapsto H_X$ and $N: H_Y \mapsto H_Y$ with adjoints for which both $M^*M$ and $N^*N$ are identities, and that $MX = X_1 + X_2$ and $NY = Y_1 + Y_2$, where $X_1 \in H_X^{(1)}$ and $Y_1 \in H_Y^{(1)}$, $H_X^{(1)}$ and $H_Y^{(1)}$ are finite-dimensional subspaces of $H_X$ and $H_Y$, respectively, and $X_2$ and $Y_2$ are mutually independent and independent of $(X_1, Y_1)$. We will call a random pair $(X,Y)$ that satisfies these conditions "at most finitely dependent." For example, paired functional data $(X,Y)$ could be at most finitely dependent if all possible dependencies between the two populations $X$ and $Y$ are attributable to at most a few principal functions (or principal components) in each population and the remaining components are independent noise.

Example 4. Suppose that we are interested in determining whether $X$ and $Y$ are independent, where $X$ is either a functional observation or some other very high-dimensional observation and $Y$ is a continuous outcome of interest such as a time to an event. Suppose also that $X$ is finitely approximable and that any potential dependence of $Y$ on $X$ is solely due to a finite set of latent principal components of $X$. Such a pair $(X,Y)$ would be at most finitely dependent.
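A small simulation in the spirit of Example 4 may help fix ideas. In the following Python sketch, $X$ is a 20-coordinate observation with decaying scales $0.5^i$, and the outcome depends on $X$ only through its first coordinate (a single latent component), while a second outcome is generated independently. The dimensions, scales, noise level, sample size and seed are hypothetical choices; the sketch only shows how the sample statistic $\mathcal{R}_n$ behaves in one such configuration and is not a check of Lemma 6.

```python
import math
import random


def dcov_v(xs, ys):
    """V_n(X, Y) = T1 + T2 - 2*T3 with Euclidean distances."""
    n = len(xs)
    a = [[math.dist(u, v) for v in xs] for u in xs]
    b = [[math.dist(u, v) for v in ys] for u in ys]
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2 * t3


def dcor_r(xs, ys):
    """R_n(X, Y), defined as 0 when the denominator vanishes."""
    denom = math.sqrt(max(0.0, dcov_v(xs, xs) * dcov_v(ys, ys)))
    return dcov_v(xs, ys) / denom if denom > 0 else 0.0


rng = random.Random(4)
n, dim = 150, 20
# X: a 20-coordinate observation whose i-th coordinate has scale 0.5**i.
xs = [tuple(rng.gauss(0.0, 1.0) * 0.5**i for i in range(dim)) for _ in range(n)]
# Outcome driven only by the first coordinate (a single latent component).
ys_dep = [(x[0] + 0.2 * rng.gauss(0.0, 1.0),) for x in xs]
# Outcome generated independently of X.
ys_ind = [(rng.gauss(0.0, 1.0),) for _ in range(n)]

r_dep = dcor_r(xs, ys_dep)
r_ind = dcor_r(xs, ys_ind)
print(round(r_dep, 3), round(r_ind, 3))
```

In this configuration the statistic is markedly larger for the dependent outcome. A formal test would still need a reference distribution under independence, for example one obtained by permuting the outcomes.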

The following lemma on at most finitely dependent data is the final result of this section:

Lemma 6. Suppose that $X$ and $Y$ are finitely approximable random variables in Hilbert spaces and that $(X,Y)$ is at most finitely dependent. Then $\mathcal{R}_0(X,Y) \ge 0$, and the inequality is strict if and only if $X$ and $Y$ are dependent.

Proof. Note first that $\|MX\|_X^2 = \langle MX, MX\rangle_X = \langle M^*MX, X\rangle_X = \langle X, X\rangle_X = \|X\|_X^2$ and, similarly, $\|NY\|_Y = \|Y\|_Y$. Since $\mathcal{R}_0(X,Y)$ is a function involving only the norms of (differences of) $X$ and $Y$, and $M$ and $N$ are linear, we can assume without loss of generality that $M$ and $N$ are identities. Thus, we will simply assume that $X = X_1 + X_2$ and $Y = Y_1 + Y_2$ hereafter. Let $(X_{2m}, Y_{2m})$ be a sequence of paired random variables in $H_X \times H_Y$ such that $E\|X_2 - X_{2m}\|_X^2 \to 0$ and $E\|Y_2 - Y_{2m}\|_Y^2 \to 0$, and where, for each $m \ge 1$, $X_{2m}$ and $Y_{2m}$ are mutually independent and also independent of $(X_1, Y_1)$.

Now let $X_m = X_1 + X_{2m}$ and $Y_m = Y_1 + Y_{2m}$, and note that both $X_m$ and $Y_m$ are finite dimensional with $\mathcal{R}_0(X_m,Y_m) \to \mathcal{R}_0(X,Y)$. Let $p_1$ and $q_1$ be the respective dimensions of $X_1$ and $Y_1$, let $p_{2m}$ and $q_{2m}$ be the respective dimensions of $X_{2m}$ and $Y_{2m}$, and let $p_m = p_1 + p_{2m}$ and $q_m = q_1 + q_{2m}$. Let $X_{2m}^{(1)}$ be the projection of $X_{2m}$ onto $H_X^{(1)}$, let $Y_{2m}^{(1)}$ be the projection of $Y_{2m}$ onto $H_Y^{(1)}$, and let $X_{2m}^{(2)} = X_{2m} - X_{2m}^{(1)}$ and $Y_{2m}^{(2)} = Y_{2m} - Y_{2m}^{(1)}$. By the finite-dimensionality of $X_1$, $X_{2m}$, $Y_1$ and $Y_{2m}$, there exist linear maps $A_1: \mathbb{R}^{p_1} \mapsto H_X^{(1)}$, $A_{2m}: \mathbb{R}^{p_{2m}} \mapsto H_X$, $B_1: \mathbb{R}^{q_1} \mapsto H_Y^{(1)}$ and $B_{2m}: \mathbb{R}^{q_{2m}} \mapsto H_Y$, such that $A_1^*A_1$, $A_{2m}^*A_{2m}$, $B_1^*B_1$ and $B_{2m}^*B_{2m}$ are all identities and that $X_1 = A_1U_1$, $X_{2m}^{(1)} = A_1U_{2m}^{(1)}$, $X_{2m}^{(2)} = A_{2m}U_{2m}^{(2)}$, $Y_1 = B_1Z_1$, $Y_{2m}^{(1)} = B_1Z_{2m}^{(1)}$ and $Y_{2m}^{(2)} = B_{2m}Z_{2m}^{(2)}$, for random vectors $U_1, U_{2m}^{(1)} \in \mathbb{R}^{p_1}$, $U_{2m}^{(2)} \in \mathbb{R}^{p_{2m}}$, $Z_1, Z_{2m}^{(1)} \in \mathbb{R}^{q_1}$ and $Z_{2m}^{(2)} \in \mathbb{R}^{q_{2m}}$, where $U_{2m} = ([U_{2m}^{(1)}]^T, [U_{2m}^{(2)}]^T)^T$ and $Z_{2m} = ([Z_{2m}^{(1)}]^T, [Z_{2m}^{(2)}]^T)^T$ are mutually independent and independent of $(U_1, Z_1)$.

If we let $U_m = ([U_1 + U_{2m}^{(1)}]^T, [U_{2m}^{(2)}]^T)^T$ and $Z_m = ([Z_1 + Z_{2m}^{(1)}]^T, [Z_{2m}^{(2)}]^T)^T$, the above formulation yields that $\|X_m\|_X = |U_m|_{p_m}$ and $\|Y_m\|_Y = |Z_m|_{q_m}$. By Lemma 3, we now have that $\mathcal{R}_0(U_m,Z_m) = \mathcal{R}_0(U_1,Z_1)$, which does not depend on $m$. Since $A_1^*A_1$ and $B_1^*B_1$ are both identities, we also have that $\mathcal{R}_0(U_1,Z_1) = \mathcal{R}_0(X_1,Y_1)$ and, thus, $\mathcal{R}_0(X_m,Y_m) = \mathcal{R}_0(U_m,Z_m) \to \mathcal{R}_0(X_1,Y_1)$, as $m \to \infty$. This now implies that $\mathcal{R}_0(X,Y) = \mathcal{R}_0(X_1,Y_1)$, which yields the desired result. □

4. Increasing power. We now briefly discuss the issue of power of tests based on $\mathcal{R}_n(X,Y)$. By Lemma 2, we observe that there are many different versions of the statistic, namely $\tilde{\mathcal{R}}_n(X,Y)$, based on different choices of the matrices $A$ and $B$ in the norms $\|\cdot\|_{A,p}$ and $\|\cdot\|_{B,q}$, that all have the ability to assess general dependence. Is it possible to choose $A$ and $B$ in a way that provides optimal power for certain fixed or contiguous alternatives? The answer should be yes, since it appears that $A$ and $B$ could potentially be selected to emphasize dependence for certain subcomponents of $X$ and $Y$ while de-emphasizing dependence for other subcomponents. Unfortunately, the answer to this question seems to be very hard to pin down rigorously. We do not pursue this further here, but it does seem to be a potentially important issue that deserves further attention.
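The intuition that $A$ can emphasize informative subcomponents can be seen in a toy simulation. In the following Python sketch, $X = (X_1, X_2)$ where only $X_1$ is related to $Y$ and $X_2$ is independent noise of the same scale; we compare $\tilde{\mathcal{R}}_n$ under $A = \operatorname{diag}(1,1)$ with $A = \operatorname{diag}(1, 0.01)$, which down-weights the noise coordinate. The diagonal weights, the data-generating model, the sample size and the seed are all hypothetical. A larger observed statistic does not by itself establish higher power, since each choice of $A$ also changes the null distribution that a test would have to be calibrated against.

```python
import math
import random


def weighted_dist(weights):
    """Distance ||u - v||_{A,p} for a diagonal A = diag(weights) (illustrative choice of A)."""
    def dist(u, v):
        return math.sqrt(sum(w * (ui - vi) ** 2 for w, ui, vi in zip(weights, u, v)))
    return dist


def dcov_v(xs, ys, dist_x, dist_y):
    """V_n(X, Y) = T1 + T2 - 2*T3 computed from pairwise distances."""
    n = len(xs)
    a = [[dist_x(u, v) for v in xs] for u in xs]
    b = [[dist_y(u, v) for v in ys] for u in ys]
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2 * t3


def dcor_r(xs, ys, dist_x, dist_y):
    """R_n(X, Y) under the given distances, defined as 0 when the denominator vanishes."""
    v_x = dcov_v(xs, xs, dist_x, dist_x)
    v_y = dcov_v(ys, ys, dist_y, dist_y)
    denom = math.sqrt(max(0.0, v_x * v_y))
    return dcov_v(xs, ys, dist_x, dist_y) / denom if denom > 0 else 0.0


rng = random.Random(7)
n = 150
# X = (X1, X2): only X1 is related to Y; X2 is independent noise of the same scale.
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
ys = [(x[0] + 0.3 * rng.gauss(0.0, 1.0),) for x in xs]

r_identity = dcor_r(xs, ys, weighted_dist((1.0, 1.0)), math.dist)
r_reweighted = dcor_r(xs, ys, weighted_dist((1.0, 0.01)), math.dist)
print(round(r_identity, 3), round(r_reweighted, 3))
```

Here down-weighting the uninformative coordinate increases the statistic, which is consistent with the heuristic above; how to choose $A$ and $B$ without knowing which subcomponents matter remains the open question.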

5. Discussion. We have briefly proposed two generalizations of the Brownian distance covariance, one based on alternatives to Euclidean norms and the other based on infinite-dimensional data. The first generalization raises the possibility of fine-tuning the statistics proposed in SR to increase power, and the second generalization opens the door for applicability of the results in SR to a broader array of data types, including infinite-dimensional data and data with dimension increasing with sample size. However, for both of these generalizations, there remain many open questions that could lead to important further improvements. In either case, the results of SR are very important both practically and theoretically and should result in many important future developments in both the application and theory of statistics.